{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "4b5daafbac08a79e",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/node_postprocessor/MixedbreadAIRerank.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29555001ef61b56f",
   "metadata": {},
   "source": [
    "# Mixedbread AI Rerank"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "84e7cd944c6dd365",
   "metadata": {},
   "source": [
    "If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "84a638b7ae22e597",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index > /dev/null\n",
    "%pip install llama-index-postprocessor-mixedbreadai-rerank > /dev/null\n",
    "%pip install llama-index-llms-openai > /dev/null"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c570bf054bffa9e8",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import VectorStoreIndex, SimpleDirectoryReader\n",
    "from llama_index.core.response.pprint_utils import pprint_response"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "30de4fe08c5f0f72",
   "metadata": {},
   "source": [
    "Download Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8260dc57d8861d01",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2025-07-24 19:14:25--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt\n",
      "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8000::154, 2606:50c0:8001::154, 2606:50c0:8002::154, ...\n",
      "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8000::154|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 75042 (73K) [text/plain]\n",
      "Saving to: ‘data/paul_graham/paul_graham_essay.txt’\n",
      "\n",
      "data/paul_graham/pa 100%[===================>]  73.28K  --.-KB/s    in 0.03s   \n",
      "\n",
      "2025-07-24 19:14:25 (2.35 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!mkdir -p 'data/paul_graham/'\n",
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "87822dcbf61b0068",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from llama_index.embeddings.mixedbreadai import MixedbreadAIEmbedding\n",
    "\n",
    "# You can visit https://www.mixedbread.ai/api-reference#quick-start-guide\n",
    "# to get an api key\n",
    "mixedbread_api_key = os.environ.get(\"MXBAI_API_KEY\", \"your-api-key\")\n",
    "model_name = \"mixedbread-ai/mxbai-embed-large-v1\"\n",
    "\n",
    "mixbreadai_embeddings = MixedbreadAIEmbedding(\n",
    "    api_key=mixedbread_api_key, model_name=model_name\n",
    ")\n",
    "\n",
    "# load documents\n",
    "documents = SimpleDirectoryReader(\"./data/paul_graham/\").load_data()\n",
    "\n",
    "# build index\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    documents=documents, embed_model=mixbreadai_embeddings\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dabb021ef1c0b8cb",
   "metadata": {},
   "source": [
    "## Retrieve top 10 most relevant nodes, then filter with MixedbreadAI Rerank"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "47844a96d5208b1c",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.postprocessor.mixedbreadai_rerank import MixedbreadAIRerank\n",
    "\n",
    "mixedbreadai_rerank = MixedbreadAIRerank(\n",
    "    api_key=mixedbread_api_key,\n",
    "    top_n=2,\n",
    "    model=\"mixedbread-ai/mxbai-rerank-large-v1\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d3ce8019715b2525",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = index.as_query_engine(\n",
    "    similarity_top_k=10,\n",
    "    node_postprocessors=[mixedbreadai_rerank],\n",
    ")\n",
    "response = query_engine.query(\n",
    "    \"What did Sam Altman do in this essay?\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "66a3b098e2612db",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Final Response: Sam Altman was asked to become the president of Y\n",
      "Combinator (YC) after the original founders decided to step back and\n",
      "reorganize the company to ensure its longevity. Initially hesitant due\n",
      "to his interest in starting a nuclear reactor startup, Sam eventually\n",
      "agreed to take over as president starting with the winter 2014 batch.\n",
      "______________________________________________________________________\n",
      "Source Node 1/2\n",
      "Node ID: 9bef8795-4532-44eb-a590-45abf15b11e5\n",
      "Similarity: 0.109680176\n",
      "Text: This seemed strange advice, because YC was doing great. But if\n",
      "there was one thing rarer than Rtm offering advice, it was Rtm being\n",
      "wrong. So this set me thinking. It was true that on my current\n",
      "trajectory, YC would be the last thing I did, because it was only\n",
      "taking up more of my attention. It had already eaten Arc, and was in\n",
      "the process of ea...\n",
      "______________________________________________________________________\n",
      "Source Node 2/2\n",
      "Node ID: 3060722a-0e57-492e-9071-2148e5eec2be\n",
      "Similarity: 0.041625977\n",
      "Text: But after Heroku got bought we had enough money to go back to\n",
      "being self-funded.  [15] I've never liked the term \"deal flow,\"\n",
      "because it implies that the number of new startups at any given time\n",
      "is fixed. This is not only false, but it's the purpose of YC to\n",
      "falsify it, by causing startups to be founded that would not otherwise\n",
      "have existed.  [1...\n"
     ]
    }
   ],
   "source": [
    "pprint_response(response, show_source=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2fd6d2dbfa36548e",
   "metadata": {},
   "source": [
    "## Directly retrieve top 2 most similar nodes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5477bfa95604d475",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = index.as_query_engine(\n",
    "    similarity_top_k=2,\n",
    ")\n",
    "response = query_engine.query(\n",
    "    \"What did Sam Altman do in this essay?\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "532fcad0c87faa22",
   "metadata": {},
   "source": [
    "Retrieved context is irrelevant and response is hallucinated."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5354dc02c93d83d1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Final Response: Sam Altman worked on the application builder, while\n",
      "Dan worked on network infrastructure, and two undergrads worked on the\n",
      "first two services (images and phone calls). Later on, Sam realized he\n",
      "didn't want to run a company and decided to build a subset of the\n",
      "project as an open source project.\n",
      "______________________________________________________________________\n",
      "Source Node 1/2\n",
      "Node ID: a42ab697-0bd1-40fc-8e23-64148e62fe6d\n",
      "Similarity: 0.557881093860686\n",
      "Text: I started working on the application builder, Dan worked on\n",
      "network infrastructure, and the two undergrads worked on the first two\n",
      "services (images and phone calls). But about halfway through the\n",
      "summer I realized I really didn't want to run a company — especially\n",
      "not a big one, which it was looking like this would have to be. I'd\n",
      "only started V...\n",
      "______________________________________________________________________\n",
      "Source Node 2/2\n",
      "Node ID: a398b429-fad6-4284-a201-835e5c1fec3c\n",
      "Similarity: 0.49815489887733433\n",
      "Text: But alas it was more like the Accademia than not. Better\n",
      "organized, certainly, and a lot more expensive, but it was now\n",
      "becoming clear that art school did not bear the same relationship to\n",
      "art that medical school bore to medicine. At least not the painting\n",
      "department. The textile department, which my next door neighbor\n",
      "belonged to, seemed to be ...\n"
     ]
    }
   ],
   "source": [
    "pprint_response(response, show_source=True)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
